Textual Data Mining through the Synergistic Combination of Classifiers and Linguistic Processors

Author

  • Sylvain Delisle
Abstract

Numerical data mining tools are generally quite robust but only provide coarse-granularity results; such tools can handle very large inputs. Computational linguistic tools are able to provide fine-granularity results but are less robust; such tools, often semi-automatic, usually handle relatively short inputs. A synergistic combination of both types of tools is the basis of our hybrid approach. First, a connectionist classifier is used to locate potentially interesting documents, or segments thereof. Second, the user selects segments that will be forwarded to the linguistic processor in order to semi-automatically analyse their textual data and extract relevant information or knowledge elements. We present the main characteristics of our hybrid approach to textual data mining, plus a methodology by which it can be put to use. We also report on the results of a first evaluation involving a corpus made up of two texts pertaining to two


Similar resources

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...


Textual Enhancement across Linguistic Structures: EFL Learners' Acquisition of English Forms

The benefits of textual input enhancement in the acquisition of linguistic forms have produced mixed results in SLA literature. The present study investigates the effects of textual enhancement on adult foreign language intake of two English linguistic forms (subjunctive mood and inversion structures) to explore the role of the type of linguistic items in input enhancement studies. It also invest...


Investigating Discourse Socialisation Progress of an English as a Second Language Learner Using Systematic Functional Linguistic Approach

This study was framed on the theory of Language Socialisation and a Systematic Functional Linguistic (SFL) approach. The aim of the study was to analyse the oral presentation discourse produced by an elementary Iranian English as a Second Language (ESL) postgraduate student in an American university four times (September/December, 2015 and March/September, 2016) over one year. The data were col...


Emotion Modeling from Writer/Reader Perspectives Using a Microblog Dataset

Most recent studies on emotion analysis and detection focus on how writers express their emotions through textual information. In this paper, we model emotion generation on the Plurk microblogging platform from both writer and reader perspectives. Support Vector Machine (SVM)-based classifiers are used for emotion prediction. To better model emotion generation on such a social network, three ty...


Presenting a Model for Extracting Information from Text Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and economy, enormous quantities of documents become available. So, for extracting useful information from textual data, text mining techniques have been used. Text Mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...




Publication date: 1999